PRIS at Chinese Language Processing
Authors
Abstract
As the volume of Chinese-language text grows, the problem of distinct people sharing the same personal name becomes increasingly important. Our personal name disambiguation system applies hierarchical agglomerative clustering and uses named entities as features for computing document similarity. We propose a two-stage strategy: the first stage performs word segmentation and named entity recognition (NER) for feature extraction, and the second stage performs clustering.
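The second stage can be illustrated with a minimal sketch. It assumes that the first stage (word segmentation and NER) has already produced one set of named entities per document. The abstract does not say which similarity measure, linkage criterion, or stopping rule the system uses, so the Jaccard similarity, average linkage, and the `threshold` value below are assumptions chosen only for illustration, and the function names and sample documents are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of named entities (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_similarity(c1, c2, features):
    """Average-linkage similarity between two clusters of document indices."""
    total = sum(jaccard(features[i], features[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_cluster(features, threshold=0.2):
    """Merge the most similar pair of clusters until no pair reaches threshold.

    features: list of named-entity sets, one per document.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            sim = cluster_similarity(clusters[a], clusters[b], features)
            if sim > best_sim:
                best_sim, best_pair = sim, (a, b)
        if best_sim < threshold:
            break  # remaining clusters are treated as different people
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical documents that all mention the same personal name,
# each reduced to the named entities found in stage one.
docs = [
    {"北京大学", "物理系", "北京"},
    {"北京大学", "北京", "中国科学院"},
    {"上海", "篮球", "上海队"},
    {"上海队", "篮球", "CBA"},
]
clusters = agglomerative_cluster(docs, threshold=0.2)
print(clusters)
```

Each resulting cluster is read as one real-world person. The threshold decides how many people the name is split into, so in practice it would need to be tuned on annotated data.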
Similar resources
Mainland Chinese Students’ Shifting Perceptions of Chinese-English Code-Mixing in Macao
As a former Portuguese colony, Macao is the only region in China where Cantonese, a variety of Chinese, and English, an international language, are enjoying de facto official statuses, with Putonghua being a quasi-official language and Portuguese being another official language. Recently, with an increasing number of Mainland Chinese students crossing the border to pursue their tertiar...
Determination of 5-Hydroxymethyl-2-furaldehyde of Crude and Processed Fructus Corni in Freely Moving Rats Using In Vivo Microdialysis Sampling and Liquid Chromatography
The purpose of this study was to develop a sensitive and fast microdialysis coupled with high-performance liquid chromatographic (HPLC) method for determination of 5-hydroxymethyl-2-furaldehyde (5-HMF) in free-moving rats after i.g. administration of the aqueous extract of crude Fructus corni and its processed products of jiuzheng pin (JZP). The concentration of 5-HMF in free-movi...
What You Need to Know about Chinese for Chinese Language Processing
The synergy between language sciences and language technology has been an elusive one for the computational linguistics community, especially when dealing with a language other than English. The reasons are two-fold: the lack of an accessible comprehensive and robust account of a specific language so as to allow strategic linking between a processing task to linguistic devices, and the lack of ...
Research on Chinese discourse rhetorical structure representation scheme and corpus annotation
It is well-known that interpretation of a text requires understanding of its rhetorical relation hierarchy since discourse units rarely exist in isolation. Such discourse structure is fundamental to document-level applications, such as text understanding, summarization, knowledge extraction and question-answering. In comparison with English, there are only a few studies on Chinese discourse ana...
HMM and CRF Based Hybrid Model for Chinese Lexical Analysis
This paper presents the Chinese lexical analysis systems developed by Natural Language Processing Laboratory at Dalian University of Technology, which were evaluated in the 4th International Chinese Language Processing Bakeoff. The HMM and CRF hybrid model, which combines character-based model with word-based model in a directed graph, is adopted in system developing. Both the closed and open t...